informed machine learning
Physics-Informed Machine Learning for Steel Development: A Computational Framework and CCT Diagram Modelling
Hedström, Peter, Cubero, Victor Lamelas, Sigurdsson, Jón, Österberg, Viktor, Kolli, Satish, Odqvist, Joakim, Hou, Ziyong, Mu, Wangzhong, Arigela, Viswanadh Gowtham
Machine learning (ML) has emerged as a powerful tool for accelerating the computational design and production of materials. In materials science, ML has primarily supported large-scale discovery of novel compounds using first-principles data and digital twin applications for optimizing manufacturing processes. However, applying general-purpose ML frameworks to complex industrial materials such as steel remains a challenge. A key obstacle is accurately capturing the intricate relationship between chemical composition, processing parameters, and the resulting microstructure and properties. To address this, we introduce a computational framework that combines physical insights with ML to develop a physics-informed continuous cooling transformation (CCT) model for steels. Our model, trained on a dataset of 4,100 diagrams, is validated against literature and experimental data. It demonstrates high computational efficiency, generating complete CCT diagrams with 100 cooling curves in under 5 seconds. It also shows strong generalizability across alloy steels, achieving phase classification F1 scores above 88% for all phases. For phase transition temperature regression, it attains mean absolute errors (MAE) below 20 °C across all phases except bainite, which shows a slightly higher MAE of 27 °C. This framework can be extended with additional generic and customized ML models to establish a universal digital twin platform for heat treatment. Integration with complementary simulation tools and targeted experiments will further support accelerated materials design workflows.
Replication Study: Enhancing Hydrological Modeling with Physics-Guided Machine Learning
Esmaeilzadeh, Mostafa, Amirzadeh, Melika
Current hydrological modeling methods combine data-driven Machine Learning (ML) algorithms and traditional physics-based models to address their respective limitations incorrect parameter estimates from rigid physics-based models and the neglect of physical process constraints by ML algorithms. Despite the accuracy of ML in outcome prediction, the integration of scientific knowledge is crucial for reliable predictions. This study introduces a Physics Informed Machine Learning (PIML) model, which merges the process understanding of conceptual hydrological models with the predictive efficiency of ML algorithms. Applied to the Anandapur sub-catchment, the PIML model demonstrates superior performance in forecasting monthly streamflow and actual evapotranspiration over both standalone conceptual models and ML algorithms, ensuring physical consistency of the outputs. This study replicates the methodologies of Bhasme, P., Vagadiya, J., & Bhatia, U. (2022) from their pivotal work on Physics Informed Machine Learning for hydrological processes, utilizing their shared code and datasets to further explore the predictive capabilities in hydrological modeling.
Physics-informed machine learning as a kernel method
Doumèche, Nathan, Bach, Francis, Boyer, Claire, Biau, Gérard
Physics-informed machine learning combines the expressiveness of data-based approaches with the interpretability of physical models. In this context, we consider a general regression problem where the empirical risk is regularized by a partial differential equation that quantifies the physical inconsistency. We prove that for linear differential priors, the problem can be formulated as a kernel regression task. Taking advantage of kernel theory, we derive convergence rates for the minimizer of the regularized risk and show that it converges at least at the Sobolev minimax rate. However, faster rates can be achieved, depending on the physical error. This principle is illustrated with a one-dimensional example, supporting the claim that regularizing the empirical risk with physical information can be beneficial to the statistical performance of estimators.
Informed Machine Learning for Improved Similarity Assessment in Process-Oriented Case-Based Reasoning
Hoffmann, Maximilian, Bergmann, Ralph
Currently, Deep Learning (DL) components within a Case-Based Reasoning (CBR) application often lack the comprehensive integration of available domain knowledge. The trend within machine learning towards so-called Informed machine learning can help to overcome this limitation. In this paper, we therefore investigate the potential of integrating domain knowledge into Graph Neural Networks (GNNs) that are used for similarity assessment between semantic graphs within process-oriented CBR applications. We integrate knowledge in two ways: First, a special data representation and processing method is used that encodes structural knowledge about the semantic annotations of each graph node and edge. Second, the message-passing component of the GNNs is constrained by knowledge on legal node mappings. The evaluation examines the quality and training time of the extended GNNs, compared to the stock models. The results show that both extensions are capable of providing better quality, shorter training times, or in some configurations both advantages at once.
Informed Machine Learning - Towards a Taxonomy of Explicit Integration of Knowledge into Machine Learning
von Rueden, Laura, Mayer, Sebastian, Garcke, Jochen, Bauckhage, Christian, Schuecker, Jannis
Despite the great successes of machine learning, it can have its limits when dealing with insufficient training data.A potential solution is to incorporate additional knowledge into the training process which leads to the idea of informed machine learning. We present a research survey and structured overview of various approaches in this field. We aim to establish a taxonomy which can serve as a classification framework that considers the kind of additional knowledge, its representation,and its integration into the machine learning pipeline. The evaluation of numerous papers on the bases of the taxonomy uncovers key methods in this field.
District Data Labs - Visual Diagnostics for More Informed Machine Learning: Part 1
How could they see anything but the shadows if they were never allowed to move their heads? Python and high level libraries like Scikit-learn, TensorFlow, NLTK, PyBrain, Theano, and MLPY have made machine learning accessible to a broad programming community that might never have found it otherwise. With the democratization of these tools, there is now a large, and growing, population of machine learning practitioners who are primarily self-taught. At the same time, the stakes of machine learning have never been higher; predictive tools are driving decision-making in every sector, from business, art, and engineering to education, law, and defense. How do we ensure our predictions are valid and robust in a time when these few lines of Python can instantiate and fit a model?
District Data Labs - Visual Diagnostics for More Informed Machine Learning: Part 2
Note: Before starting Part 2, be sure to read Part 1! When it comes to machine learning, ultimately the most important picture to have is the big picture. Whether it's logistic regression, random forests, Bayesian methods, support vector machines, or neural nets, everyone seems to have their favorite! Unfortunately these discussions tend to truncate the challenges of machine learning into a single problem, which is a particularly problematic misrepresentation for people who are just getting started with machine learning. Sure, picking a good model is important, but it's certainly not enough (and it's debatable whether a model can actually be'good' devoid of the context of the domain, the hypothesis, the shape of the data, and the intended application. In this post we'll discuss model selection in the context of the big picture, which I'll present in terms of the model selection triple, and we'll explore a set of visual tools for navigating the triple.
District Data Labs - Visual Diagnostics for More Informed Machine Learning: Part 3
Note: Before starting Part 3, be sure to read Part 1 and Part 2! In this final installment of Visual Diagnostics for More Informed Machine Learning, we'll close the loop on visualization tools for navigating the different phases of the machine learning workflow. Recall that we are framing the workflow in terms of the'model selection triple' -- this includes analyzing and selecting features, experimenting with different model forms, and evaluating and tuning fitted models. So far, we've covered methods for visual feature analysis in Part 1 and methods for model family and form exploration in Part 2. This post will cover evaluation and tuning, so we'll begin with two questions: You've probably heard other machine learning practitioners talking about their F1 scores or their R-Squared value. Generally speaking, we do tend to rely on numeric scores to tell us when our models are performing well or poorly. There are a number of measures we can use to evaluate our fitted models.